Robust L1 orthogonal regression
Authors
Abstract
Assessing the linear relationship between a set of continuous predictors and a continuous response is a well-studied problem in statistics and is applied in many data mining situations. L2-based methods such as ordinary least squares and principal components regression can be used to determine this relationship. However, both of these methods become impaired when multicollinearity is present. This problem is compounded when outliers confound standard multicollinearity diagnostics. This work proposes an L1 orthogonal regression method (L1OR) formulated as a nonconvex optimization problem. Solution strategies for finding globally optimal solutions are presented. A simulation study of the method's robustness to outliers shows that L1OR is superior to ordinary least squares regression (OLS) and principal components regression (PCR), and is competitive with M-regression (M-R), in the presence of outliers. The new method also outperforms OLS, PCR, and M-R on data from an environmental application.

Introduction and Background

Data miners are often posed with the problem of determining the relationship between several predictor variables and a response variable. Standard techniques include ordinary least squares regression (OLS), principal components regression (PCR), and non-parametric and semi-parametric regression. When outliers, or unusual observations, are present in the data, these regression techniques become impaired. Methods such as M-regression (M-R) use M-estimates to reduce the impact of outliers. However, these methods are not designed for developing errors-in-variables models, in which both the predictors and the response have measurement error or are considered random components; an example is pH and alkalinity measured in the field, both of which usually carry measurement error. Orthogonal regression (OR) is used when uncertainty is known to be present in both the independent and dependent variables.
In contrast with OLS, where residuals are measured as the vertical distances of observations to the fitted surface, in OR residuals are measured as the orthogonal distances to the fitted surface.

Previous Work on Robust Orthogonal Regression

The sensitivity of OR to outliers has been noted, and other investigators have worked to develop robust methods (Brown, 1982; Carroll and Gallo, 1982; Zamar, 1989). The work of Zamar (1989) includes the use of S- and M-estimates for OR. Orthogonal regression can be formulated as equivalent to finding the last principal component, i.e., the direction of minimum variation, in principal component analysis (PCA). Hence, any robust PCA method can be used for robust orthogonal regression. The two main approaches for robust PCA are (1) finding robust estimates of the covariance matrix (in traditional PCA, the principal components are eigenvectors of the covariance matrix) and (2) using a robust measure of dispersion. Research in the former area includes (Campbell, 1980; Devlin et al., 1981; Galpin and Hawkins, 1987; Naga, 1990; Marden, 1999; Croux and Haesbroeck, 2000; Kamiya and Eguchi, 2001). The latter approach coincides with the work presented here. Robust measures of dispersion in PCA have been investigated in (Li and Chen, 1985; Xie et al., 1993; Maronna, 2005).

Traditional Orthogonal Regression

Suppose we are given observations with continuous responses (xi, yi) ∈ Rd × R, i = 1, . . . , n. OR seeks an orthogonal projection of the data onto a hyperplane such that the orthogonal distances of the points (xi, yi) to the hyperplane are minimized. We assume throughout this work that the means have been subtracted from the samples, so that the fitted hyperplane passes through the origin. In OR, the sum of squared orthogonal distances of (xi, yi) to the hyperplane defined by bT (x, y) = 0 is minimized. Finding b involves first optimizing the problem
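The equivalence described above, between traditional (L2) orthogonal regression and the last principal component, can be made concrete with a short sketch: center the data, form the covariance matrix, and take its eigenvector of smallest eigenvalue as the normal b of the fitted hyperplane bT(x, y) = 0. This is a minimal NumPy illustration, not the authors' implementation; the function name is ours.

```python
import numpy as np

def orthogonal_regression(X, y):
    """Fit a hyperplane b^T (x, y) = 0 minimizing squared orthogonal distances."""
    Z = np.column_stack([X, y])
    Z = Z - Z.mean(axis=0)              # subtract means so the plane passes through the origin
    cov = Z.T @ Z / len(Z)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    b = eigvecs[:, 0]                   # last principal component: direction of minimum variation
    residuals = Z @ b                   # signed orthogonal distances (||b|| = 1)
    return b, residuals
```

For data lying near a line y = 2x, the recovered normal is proportional to (2, -1), i.e., orthogonal to the line's direction (1, 2), up to sign.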
Similar papers
Fitting Two Concentric Circles and Spheres to Data by l1 Orthogonal Distance Regression
The problem of fitting two concentric circles and spheres to data arises in computational metrology. The most commonly used criterion for this is the least squares norm. There is also interest in other criteria, and here we focus on the use of the l1 norm, which is traditionally regarded as important when the data contain wild points. A common approach to this problem involves an iteration proces...
L1-norm Penalised Orthogonal Forward Regression
An l1-norm penalized orthogonal forward regression (l1-POFR) algorithm is proposed based on the concept of leave-one-out mean square error (LOOMSE). Firstly, a new l1-norm penalized cost function is defined in the constructed orthogonal space, and each orthogonal basis is associated with an individually tunable regularization parameter. Secondly, due to the orthogonal computation, the LOOMSE can be anal...
Least Squares Optimization with L1-Norm Regularization
This project surveys and examines optimization approaches proposed for parameter estimation in Least Squares linear regression models with an L1 penalty on the regression coefficients. We first review linear regression and regularization, and both motivate and formalize this problem. We then give a detailed analysis of 8 of the varied approaches that have been proposed for optimizing this objec...
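As a concrete illustration of the problem class this survey addresses, the following sketch solves L1-penalised least squares by iterative soft-thresholding (ISTA): a gradient step on the squared loss followed by a shrinkage step that sets small coefficients exactly to zero. This is one simple member of the family of approaches such surveys cover, not any specific algorithm from the paper; the function name and parameters are illustrative.

```python
import numpy as np

def lasso_ista(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 via iterative soft-thresholding."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)             # gradient of the smooth squared-loss term
        z = x - step * grad                  # gradient descent step
        # soft-thresholding: the proximal operator of the l1 penalty
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x
```

With A the identity the solution reduces to componentwise soft-thresholding of b, which makes the sparsity-inducing effect of the l1 penalty easy to verify.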
Outlier-Resistant L1 Orthogonal Regression via the Reformulation-Linearization Technique
Assessing the linear relationship between a set of continuous predictors and a continuous response is a well-studied problem in statistics and data mining. L2-based methods such as ordinary least squares and orthogonal regression can be used to determine this relationship. However, both of these methods become impaired when influential values are present. This problem becomes compounded when ou...
A robust autoregressive gaussian process motion model using l1-norm based low-rank kernel matrix approximation
This paper considers the problem of modeling complex motions of pedestrians in a crowded environment. A number of methods have been proposed to predict the motion of a pedestrian or an object. However, it is still difficult to make a good prediction due to challenges, such as the complexity of pedestrian motions and outliers in a training set. This paper addresses these issues by proposing a ro...